Jupyter notebooks

How does it work?

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

<img src= "images/02_Jupyter the inner workings/Notebook sample breast cancer.png", width=100%>

The Jupyter project

Jupyter is an open source project

  • Written in Python
  • Kernels enable connectivity to other languages

Dependencies

  • The notebook server and its kernels communicate through the ZeroMQ messaging library
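
To make the messaging a little more concrete, here is a minimal sketch of how one message could be assembled and signed before it is sent over a ZeroMQ socket. It uses only the Python standard library and leaves out the sockets themselves. The four-part layout (header, parent header, metadata, content) and the HMAC-SHA256 signature follow the Jupyter messaging protocol; the user name, the session id, and the shortened content dictionary are assumptions made for illustration.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

# Frame that separates ZeroMQ routing identities from the signed message parts.
DELIMITER = b"<IDS|MSG>"


def make_execute_request(code, session_id):
    """Build the four dictionaries that make up an execute_request message."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": "workshop-user",  # hypothetical user name
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",
    }
    parent_header = {}  # empty: this message is not a reply to another one
    metadata = {}
    # Shortened for illustration; a real request carries more fields.
    content = {"code": code, "silent": False}
    return header, parent_header, metadata, content


def sign(frames, key):
    """Return the hex HMAC-SHA256 signature over the serialized frames."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        mac.update(frame)
    return mac.hexdigest().encode("ascii")


def serialize(parts, key):
    """Turn the four dictionaries into the list of frames sent on the wire."""
    frames = [json.dumps(part).encode("utf-8") for part in parts]
    return [DELIMITER, sign(frames, key)] + frames


def verify(frames, key):
    """Check the delimiter and signature of a received list of frames."""
    if len(frames) < 6 or frames[0] != DELIMITER:
        return False
    return hmac.compare_digest(frames[1], sign(frames[2:6], key))


if __name__ == "__main__":
    key = b"key-from-the-connection-file"  # hypothetical shared key
    wire = serialize(make_execute_request("1 + 1", "demo-session"), key)
    print(verify(wire, key))
```

The kernel performs the same check on arrival, so a message signed with a different key is rejected.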

Installation

  • An easy way to get Jupyter is to install the Anaconda distribution

ZeroMQ

How does Jupyter work?

<img src= "images/02_Jupyter the inner workings/jupyter architecture 1.png", width=100%>

How does Jupyter work with R?

<img src= "images/02_Jupyter the inner workings/jupyter architecture 2.png", width=100%>

IRkernel

  • IRkernel is the most widely used kernel for R
  • It relays messages between R and ZeroMQ

Installing Jupyter

Several kernels exist for R. The most widely used is IRkernel.

Installation of IRkernel


In [1]:
# install.packages(
#  c('rzmq','repr','IRkernel','IRdisplay'), 
#  repos = c('http://irkernel.github.io/', 
#          getOption('repos'))
#)

Register IRkernel with Jupyter


In [2]:
# IRkernel::installspec()
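
Registering a kernel amounts to writing a small kernel.json file into a directory that Jupyter searches for kernel specs. The Python sketch below shows roughly what such a spec looks like for R and how Jupyter fills in the connection file when it launches the kernel. The exact argv that IRkernel writes may differ between versions, and the helper names and the temporary directory are assumptions for illustration, not the real install location.

```python
import json
import tempfile
from pathlib import Path

# Illustrative kernel spec for R; the argv written by IRkernel may differ by version.
R_KERNEL_SPEC = {
    "argv": ["R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
    "display_name": "R",
    "language": "R",
}


def write_kernel_spec(kernels_dir, name="ir", spec=R_KERNEL_SPEC):
    """Write <kernels_dir>/<name>/kernel.json and return its path (hypothetical helper)."""
    spec_dir = Path(kernels_dir) / name
    spec_dir.mkdir(parents=True, exist_ok=True)
    path = spec_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


def launch_command(spec, connection_file):
    """Substitute the connection file path into the argv template."""
    return [arg.replace("{connection_file}", connection_file) for arg in spec["argv"]]


if __name__ == "__main__":
    # A temporary directory stands in for Jupyter's real kernels directory.
    with tempfile.TemporaryDirectory() as kernels_dir:
        spec_path = write_kernel_spec(kernels_dir)
        print(spec_path.read_text())
        print(launch_command(R_KERNEL_SPEC, "/tmp/kernel-1234.json"))
```

The connection file named in the last argument tells the kernel which ZeroMQ ports to use and which key signs its messages.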

Jupyter is a single user environment

  • By default, Jupyter only exposes localhost
  • It is possible to open SSH or HTTPS access to a remote machine
  • However, this is only for a single user

The solution is to use Jupyter Hub

<img src= "images/02_Jupyter the inner workings/jupyter hub logo.png", width=25%>

Jupyter Hub

<img src= "images/02_Jupyter the inner workings/jupyter hub.png", width=100%>

Jupyter Hub

A multi-user server

  • Manages and proxies multiple instances of the single-user Jupyter notebook server

Three actors:

  • multi-user Hub (tornado process)
  • configurable HTTP proxy (node-http-proxy)
  • multiple single-user IPython notebook servers (Python/IPython/tornado)

Basic principles:

  • Hub spawns proxy
  • Proxy forwards ~all requests to hub by default
  • Hub handles login, and spawns single-user servers on demand
  • Hub configures proxy to forward URL prefixes to single-user servers
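
The routing principle above can be sketched in a few lines of Python. This is a toy model, not the Jupyter Hub implementation: no processes are spawned, and the class names, the port numbers, and the /user/<name>/ prefix layout are assumptions chosen for illustration. It shows a proxy that sends everything to the hub by default and picks the longest matching prefix once the hub has registered a user's server.

```python
class ProxyRoutes:
    """Toy routing table: maps URL prefixes to backend addresses."""

    def __init__(self, hub_target):
        # The default route "/" sends every request to the hub.
        self.routes = {"/": hub_target}

    def add_route(self, prefix, target):
        # Store prefixes with a trailing slash so "/user/al/" cannot match "/user/alice/".
        if not prefix.endswith("/"):
            prefix += "/"
        self.routes[prefix] = target

    @staticmethod
    def _matches(prefix, path):
        return prefix == "/" or path == prefix.rstrip("/") or path.startswith(prefix)

    def resolve(self, path):
        """Return the backend for the longest prefix that matches the path."""
        candidates = [p for p in self.routes if self._matches(p, path)]
        return self.routes[max(candidates, key=len)]


class Hub:
    """Toy hub: on login it 'spawns' a server and registers its route."""

    def __init__(self):
        self.proxy = ProxyRoutes(hub_target="http://127.0.0.1:8081")
        self.next_port = 9000  # hypothetical starting port for single-user servers

    def login(self, username):
        # A real hub would start a notebook server process here.
        target = f"http://127.0.0.1:{self.next_port}"
        self.next_port += 1
        self.proxy.add_route(f"/user/{username}/", target)
        return target


if __name__ == "__main__":
    hub = Hub()
    print(hub.proxy.resolve("/user/alice/tree"))  # no route yet: goes to the hub
    hub.login("alice")
    print(hub.proxy.resolve("/user/alice/tree"))  # now goes to alice's server
```

Before login the request falls through to the hub, which can then authenticate the user and start a server; afterwards the more specific prefix wins.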

How we manage this workshop

  • Create a Data Science Virtual Machine on Azure
    • Select the Linux version
    • This includes Jupyter Notebook as well as Jupyter Hub
  • Create content and publish it on GitHub
  • Pull the content onto the VM with git pull
  • Create users
  • Start Jupyter Hub